System and method for performing speech analytics with objective function and feature constraints

ABSTRACT

Disclosed herein are systems, methods, and non-transitory computer-readable storage media for performing trend analysis of speech. A system practicing the method receives a speech trend analysis request having candidate feature constraints, an objective function with respect to a speech trend to be analyzed, and a set of speech record constraints. The system selects a subset of speech records from the group of speech records based on the set of speech record constraints to yield selected speech records, identifies features in the selected speech records based on the set of candidate feature constraints to yield identified features, and assigns a weight to each of the identified features based on the objective function. Then the system ranks the identified features by their respective weights to yield ranked identified features, and outputs at least one of the ranked identified features associated with a speech-based trend in response to the speech trend analysis request.

PRIORITY INFORMATION

The present application is a continuation of U.S. patent application Ser. No. 12/895,337, filed Sep. 30, 2010, the contents of which are incorporated herein by reference in their entirety.

BACKGROUND

1. Technical Field

The present disclosure relates to speech analytics and more specifically to a flexible, adaptive approach to speech analytics.

2. Introduction

Speech analytics is a form of speech data mining beyond mere transcription and voice search. One working definition of speech analytics is the synthesis of actionable information from multiple conversations, such as a real-time or recorded set of conversations between call center employees and customers. Current approaches to speech analytics include packages that analyze speech with a single task or problem in mind. Thus, analysts must use a patchwork of different tools and guesswork every time they have a new analytics problem.

SUMMARY

Additional features and advantages of the disclosure will be set forth in the description which follows, and in part will be obvious from the description, or can be learned by practice of the herein disclosed principles. The features and advantages of the disclosure can be realized and obtained by means of the instruments and combinations particularly pointed out in the appended claims. These and other features of the disclosure will become more fully apparent from the following description and appended claims, or can be learned by the practice of the principles set forth herein.

The architecture and approaches disclosed herein enable an analyst to solve many analytics problems intuitively and directly, for which separate systems were required in the past. One common application of speech analytics is in customer service call centers, but other applications include emergency services hotlines, crisis intervention centers, polling organizations, and outbound telephone survey companies. This system makes analysts' jobs easier and more effective by using a uniform representation for a large class of analytics problems together with an intuitive user interface. A speech analytics system is only as useful as an analyst's ability to understand, navigate, and control the system.

Speech analytics is one way to gather business intelligence from large data sets of speech. The speech can be any set of speech, such as conversations between two or more speakers, between one speaker and an interactive voice response (or other automated) system, or a monologue such as a classroom lecture. Organizations can use such intelligence generated from speech analytics to cut costs, discover sales opportunities, improve marketing campaigns, and so forth. One specific example of how speech analytics in call centers can be valuable is discovering ways to improve customer service based on customer satisfaction surveys, such as by coaching service agents. Other examples include discovering ways to reduce average call handling time thereby reducing labor costs, discovering conversation patterns correlated with upselling/cross-selling or lack thereof, predicting product demand for inventory planning such as calls to a department store asking “Do you carry Product X?”, and discovering problems that many customers are calling about in order to reduce call volume by preemptively solving those problems. Many other applications of speech analytics exist besides these examples.

Disclosed are systems, methods, and non-transitory computer-readable storage media for performing speech trend analysis. A system performing the method first receives, as part of a speech trend analysis request, a set of candidate feature constraints, an objective function with respect to a speech trend to be analyzed, and a set of speech record constraints to be applied to a group of speech records. Then the system selects a subset of speech records from the group of speech records based on the set of speech record constraints to yield selected speech records, and identifies features in the selected speech records based on the set of candidate feature constraints to yield identified features. The system can further assign a weight to each of the identified features based on the objective function, rank the identified features by their respective weights to yield ranked identified features, and output at least one of the ranked identified features associated with a speech-based trend in response to the speech trend analysis request.

Also disclosed are systems, methods, and non-transitory computer-readable storage media for generating an alert based on speech analytics data. A system practicing this method first generates elements of a time series. Each element can include speech records having timestamps within a same time interval. The system then generates a numeric value for each element in the time series based on a weight for each speech record, an objective function with respect to a trend to be analyzed, and a set of record constraints to be applied to a group of records. The system then generates an alarm when at least one respective numeric value for at least one element in the time series meets a threshold.

BRIEF DESCRIPTION OF THE DRAWINGS

In order to describe the manner in which the above-recited and other advantages and features of the disclosure can be obtained, a more particular description of the principles briefly described above will be rendered by reference to specific embodiments thereof, which are illustrated in the appended drawings. Understanding that these drawings depict only exemplary embodiments of the disclosure and are not therefore to be considered to be limiting of its scope, the principles herein are described and explained with additional specificity and detail through the use of the accompanying drawings in which:

FIG. 1 illustrates an example system embodiment;

FIG. 2 illustrates a flow diagram of an exemplary speech analytics system;

FIG. 3 illustrates an exemplary trend analysis graphical user interface;

FIG. 4 illustrates an example method embodiment for performing trend analysis of speech; and

FIG. 5 illustrates an example method embodiment for generating an alert based on speech analytics data.

DETAILED DESCRIPTION

Various embodiments of the disclosure are discussed in detail below. While specific implementations are discussed, it should be understood that this is done for illustration purposes only. A person skilled in the relevant art will recognize that other components and configurations may be used without parting from the spirit and scope of the disclosure.

The present disclosure addresses the need in the art for more flexible, adaptive, and extensible speech analytics systems. A brief discussion of a basic general-purpose system or computing device in FIG. 1, which can be employed to practice the concepts, is disclosed herein. The disclosure then turns to a discussion of a general system architecture of an exemplary speech analytics system and its various components and user interfaces. The disclosure then turns to a discussion of trend analysis and an alerting system. Finally, the disclosure turns to a more detailed description of the exemplary method. The disclosure now turns to the exemplary computing system of FIG. 1.

With reference to FIG. 1, an exemplary system 100 includes a general-purpose computing device 100, including a processing unit (CPU or processor) 120 and a system bus 110 that couples various system components including the system memory 130 such as read only memory (ROM) 140 and random access memory (RAM) 150 to the processor 120. The system 100 can include a cache of high speed memory connected directly with, in close proximity to, or integrated as part of the processor 120. The system 100 copies data from the memory 130 and/or the storage device 160 to the cache for quick access by the processor 120. In this way, the cache provides a performance boost that avoids processor 120 delays while waiting for data. These and other modules can control or be configured to control the processor 120 to perform various actions. Other system memory 130 may be available for use as well. The memory 130 can include multiple different types of memory with different performance characteristics. It can be appreciated that the disclosure may operate on a computing device 100 with more than one processor 120 or on a group or cluster of computing devices networked together to provide greater processing capability. The processor 120 can include any general purpose processor and a hardware module or software module, such as module 1 162, module 2 164, and module 3 166 stored in storage device 160, configured to control the processor 120 as well as a special-purpose processor where software instructions are incorporated into the actual processor design. The processor 120 may essentially be a completely self-contained computing system, containing multiple cores or processors, a bus, memory controller, cache, etc. A multi-core processor may be symmetric or asymmetric.

The system bus 110 may be any of several types of bus structures including a memory bus or memory controller, a peripheral bus, and a local bus using any of a variety of bus architectures. A basic input/output system (BIOS) stored in ROM 140 or the like may provide the basic routine that helps to transfer information between elements within the computing device 100, such as during start-up. The computing device 100 further includes storage devices 160 such as a hard disk drive, a magnetic disk drive, an optical disk drive, tape drive or the like. The storage device 160 can include software modules 162, 164, 166 for controlling the processor 120. Other hardware or software modules are contemplated. The storage device 160 is connected to the system bus 110 by a drive interface. The drives and the associated computer readable storage media provide nonvolatile storage of computer readable instructions, data structures, program modules and other data for the computing device 100. In one aspect, a hardware module that performs a particular function includes the software component stored in a non-transitory computer-readable medium in connection with the necessary hardware components, such as the processor 120, bus 110, display 170, and so forth, to carry out the function. The basic components are known to those of skill in the art and appropriate variations are contemplated depending on the type of device, such as whether the device 100 is a small, handheld computing device, a desktop computer, or a computer server.

Although the exemplary embodiment described herein employs the hard disk 160, it should be appreciated by those skilled in the art that other types of computer readable media which can store data that are accessible by a computer, such as magnetic cassettes, flash memory cards, digital versatile disks, cartridges, random access memories (RAMs) 150, read only memory (ROM) 140, a cable or wireless signal containing a bit stream and the like, may also be used in the exemplary operating environment. Non-transitory computer-readable storage media expressly exclude media such as energy, carrier signals, electromagnetic waves, and signals per se.

To enable user interaction with the computing device 100, an input device 190 represents any number of input mechanisms, such as a microphone for speech, a touch-sensitive screen for gesture or graphical input, keyboard, mouse, motion input, speech and so forth. An output device 170 can also be one or more of a number of output mechanisms known to those of skill in the art. In some instances, multimodal systems enable a user to provide multiple types of input to communicate with the computing device 100. The communications interface 180 generally governs and manages the user input and system output. There is no restriction on operating on any particular hardware arrangement and therefore the basic features here may easily be substituted for improved hardware or firmware arrangements as they are developed.

For clarity of explanation, the illustrative system embodiment is presented as including individual functional blocks including functional blocks labeled as a “processor” or processor 120. The functions these blocks represent may be provided through the use of either shared or dedicated hardware, including, but not limited to, hardware capable of executing software and hardware, such as a processor 120, that is purpose-built to operate as an equivalent to software executing on a general purpose processor. For example, the functions of one or more processors presented in FIG. 1 may be provided by a single shared processor or multiple processors. (Use of the term “processor” should not be construed to refer exclusively to hardware capable of executing software.) Illustrative embodiments may include microprocessor and/or digital signal processor (DSP) hardware, read-only memory (ROM) 140 for storing software performing the operations discussed below, and random access memory (RAM) 150 for storing results. Very large scale integration (VLSI) hardware embodiments, as well as custom VLSI circuitry in combination with a general purpose DSP circuit, may also be provided.

The logical operations of the various embodiments are implemented as: (1) a sequence of computer-implemented steps, operations, or procedures running on a programmable circuit within a general use computer, (2) a sequence of computer-implemented steps, operations, or procedures running on a specific-use programmable circuit; and/or (3) interconnected machine modules or program engines within the programmable circuits. The system 100 shown in FIG. 1 can practice all or part of the recited methods, can be a part of the recited systems, and/or can operate according to instructions in the recited non-transitory computer-readable storage media. Such logical operations can be implemented as modules configured to control the processor 120 to perform particular functions according to the programming of the module. For example, FIG. 1 illustrates three modules Mod1 162, Mod2 164 and Mod3 166 which are modules configured to control the processor 120. These modules may be stored on the storage device 160 and loaded into RAM 150 or memory 130 at runtime or may be stored as would be known in the art in other computer-readable memory locations.

Having disclosed some basic components of an exemplary computing device, the disclosure returns to speech analytics. One working definition of speech analytics, as described above, is the synthesis of actionable information from multiple conversations, such as a real-time or recorded set of conversations between call center employees and customers. What constitutes actionable information varies from application to application. A speech analytics system can also provide for speech retrieval and browsing. For example, a speech retrieval system can help an analyst find conversations where certain words occur in a speech database, and browse to and play back a particular part of the conversation where those words occur. This functionality can save the analyst inestimable hours of listening to irrelevant conversations as well as irrelevant parts of relevant conversations.

With this in mind, the discussion now turns to the exemplary speech analytics system architecture 200 as shown in FIG. 2. Speech analytics systems vary widely in their level of sophistication. Some offer little more than speech browsing and retrieval. Others present additional layers of functionality, which build one on top of another. One example foundational analytics layer is feature extraction. Before a speech analytics system can do anything with its collection of conversations, it must extract various pieces of information (“features”) from each conversation. Speech analytics systems use a variety of features to inform their most basic function: retrieving conversations that match certain criteria. Thus, a retrieval layer can be built upon the feature extraction layer.

A browsing layer can be built on the retrieval layer. The browsing layer offers analysts a way to see and/or listen to examples of the kinds of data, such as retrieved conversations, that contributed to salient aspects of its analyses. The actual analytics functionality of a speech analytics system begins with the next layer, trend analysis, which is also built on the retrieval layer. Finally, a subsystem performing the work of an alerting layer can be built on top of the trend analysis layer and/or the retrieval layer.

The top half of FIG. 2, above the hexagon representing the analyst 226, illustrates the browsing and retrieval functionality layers of a speech analytics system. In FIG. 2, the rectangles represent data and ovals represent process modules. In one aspect, the transcription 204, text feature extraction 208, acoustic feature extraction 210, and relational feature extraction 216 process modules run regardless of whether an analyst 226 is using the system 200 at the time. The system 200 converts the original conversation data 202 to a format suitable for audio streaming 222 through a media player and stores it on a media server 220 for subsequent retrieval and playback upon request 248 from the analyst 226. The transcription module 204 transcribes the conversations 202 to produce a set of transcripts 206. The text feature extraction module 208 extracts words and a variety of other information from the transcripts 206 and adds the extracted information to a database 212. The acoustic feature extraction module 210 can also extract non-verbal information from the conversations, which can be used to inform the text feature extraction module 208. The relational feature extraction module 216 can extract metadata 214 pertaining to each conversation and add it to the database 212 as well. Metadata, or information about the conversations other than their actual contents, can play an important role in speech analytics systems.

The analyst 226 typically initiates operation of the media server and player 220 and the database management system (DBMS) 218. The analyst 226 can initiate a speech retrieval session by issuing a query 224 to the DBMS 218. The DBMS 218 returns the relevant records 246 that satisfy the query 224. The system 200 presents the records 246 in a user interface that allows the analyst 226 to “drill down” through the list of records returned to examine individual records and listen to individual conversations. The media server and player 220 support this browsing and playback functionality.

The first layer of analytics functionality can be implemented on top of a speech browsing and retrieval system, merely by exploiting the usual capabilities of the DBMS 218. Most DBMSs, including commercial databases available from Oracle, Microsoft, and IBM as well as open source databases such as MySQL, PostgreSQL, and SQLite, support some variation of the ubiquitous SQL relational query language. Most flavors of SQL include primitives for sorting, grouping, counting, and averaging of database records. The rudimentary analyses supported by SQL enable analysts to ask “what/which” questions about speech data, such as “Which conversations involving customer service agent Jane talked about product XYZ?” and “What was the average length of conversations that discussed issue ABC?” Such questions are typically motivated by a specific business problem that the analyst 226 is trying to solve, such as agent Jane's understanding of product XYZ or the labor costs of resolving issue ABC. However, in order to ask the right kinds of questions in a traditional implementation, the analyst 226 must already know what the problem is.
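As a minimal sketch of the two “what/which” questions above, the following uses Python's built-in sqlite3 module against an in-memory table. The table name, column names, and sample rows are assumptions made purely for illustration; the disclosure does not specify a schema.

```python
import sqlite3

# Hypothetical schema: one row per conversation. All column names are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE conversations ("
    "id INTEGER PRIMARY KEY, agent TEXT, product TEXT, issue TEXT, duration_sec REAL)"
)
conn.executemany(
    "INSERT INTO conversations (agent, product, issue, duration_sec) VALUES (?, ?, ?, ?)",
    [
        ("Jane", "XYZ", "ABC", 300.0),
        ("Jane", "QRS", "DEF", 120.0),
        ("Raj", "XYZ", "ABC", 500.0),
    ],
)

# "Which conversations involving agent Jane talked about product XYZ?"
jane_xyz = [
    row[0]
    for row in conn.execute(
        "SELECT id FROM conversations WHERE agent = ? AND product = ? ORDER BY id",
        ("Jane", "XYZ"),
    )
]

# "What was the average length of conversations that discussed issue ABC?"
avg_abc = conn.execute(
    "SELECT AVG(duration_sec) FROM conversations WHERE issue = ?", ("ABC",)
).fetchone()[0]
```

Both queries presuppose that the analyst already knows which agent, product, or issue to ask about, which is the limitation the paragraph above points out.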

In many situations, the analyst 226 is aware that a problem exists but does not know exactly what the problem is or how to look for it. In such situations, the analyst 226 can ask the system the more difficult “why” questions, such as “Why were our customers' assessments of our service agents so low last week?” or “Why is the average call duration so high in our Texas call centers?” The system 200 can answer such questions using the next layer of speech analytics functionality, represented by the feature filtering module 242 and the trend analysis module 236. The system 200 can mathematically formulate such questions as problems of statistical feature selection. In other words, the analyst 226 wants to know which “features” of the data best explain one particular statistic of the data, such as call duration. This statistic is called the objective function 234, which is provided as input to the trend analysis module 236. A feature can be any piece of information that the system's database contains about each conversation 202. The number of potential features can be very large, so an analyst 226 can provide the system 200 with additional guidance about the kinds of features that the analyst 226 thinks might be relevant. Such guidance can be formulated as a set of “feature constraints” 240, which are fed to the feature filtering module 242.

The trend analysis module 236 then induces a model of how the selected features 244 correlate with the objective function 234, and the system 200 reports the most highly correlated, or prominent, features 238 to the analyst 226. These highly correlated features 238 are often called “drivers” in business contexts. Unfortunately, “drivers” is a misnomer, because correlation is not the same as causation. Trend analysis can be generalized from explaining one statistic to explaining how one statistic varies with respect to another, such as time. For example, the analyst 226 can ask which features contribute to the fluctuation of call duration over time, or which features best explain a spike in call volume. In terms of data flow, the only additional information necessary for this generalization is an additional objective function 234. Thus, the analyst 226 can specify multiple objective functions 234 for the trend analysis module 236.
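One simple way to picture this ranking step is sketched below. It assumes, purely for illustration, that each record carries a set of binary features and that the weight of a feature is its Pearson correlation with the objective function. The disclosure does not commit to a particular model, so the function names, record layout, and choice of Pearson correlation are all hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def rank_features(records, feature_names, objective):
    """Weight each candidate feature by its correlation with the objective, then rank."""
    targets = [objective(record) for record in records]
    weights = {}
    for name in feature_names:
        # Binary indicator: 1.0 if the feature is present in the record, else 0.0.
        present = [1.0 if name in record["features"] else 0.0 for record in records]
        weights[name] = pearson(present, targets)
    # Strongest correlation first, whether positive or negative.
    return sorted(weights.items(), key=lambda item: abs(item[1]), reverse=True)


# Illustrative records: a feature set plus call duration in seconds.
records = [
    {"features": {"refund", "hold"}, "duration": 900},
    {"features": {"refund"}, "duration": 800},
    {"features": {"hold"}, "duration": 300},
    {"features": {"greeting"}, "duration": 200},
]

# The objective function here is simply the call duration of each record.
ranked = rank_features(records, ["hold", "refund"], lambda record: record["duration"])
```

In this toy data the "refund" feature ranks first. As the text cautions, such a ranking shows correlation with call duration, not that refunds cause long calls.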

All the modules and their functionality described so far either run independently of the analyst 226 or are initiated by the analyst 226. The next layer of functionality, represented by the alerting system module 230, lets the speech analytics system 200 take the initiative. The alerting system module 230 sends alerts 232 to the analyst 226 when the DBMS 218 meets certain conditions in which the analyst 226 is interested. For example, the analyst 226 may want to be alerted whenever the mean call duration increases by 10% from one week to the next. If suitably configured, the speech analytics system 200 can automatically perform the relevant trend analysis every week, and notify the analyst 226 whenever the condition of interest is satisfied. Alternately, the speech analytics system 200 can provide a report to the analyst 226 summarizing the status and/or trend of the mean call duration even if the condition is not satisfied. The disclosure now turns to a more in-depth discussion of some of the modules in FIG. 2.
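The weekly alert condition in the example above can be sketched as follows. This is a minimal illustration, not the disclosed alerting module: the function names, the (date, duration) input format, and the grouping by ISO calendar week are assumptions, and adjacent entries in the series are assumed to be consecutive weeks.

```python
from collections import defaultdict
from datetime import date


def weekly_means(calls):
    """Group (date, duration_sec) pairs by ISO week and return sorted (week, mean) pairs."""
    buckets = defaultdict(list)
    for day, duration in calls:
        iso = day.isocalendar()
        buckets[(iso[0], iso[1])].append(duration)  # key: (ISO year, ISO week)
    return [(week, sum(values) / len(values)) for week, values in sorted(buckets.items())]


def find_alerts(series, threshold=0.10):
    """Return the weeks whose mean rose by at least `threshold` over the previous entry."""
    alerts = []
    for (_, previous), (week, current) in zip(series, series[1:]):
        if previous > 0 and (current - previous) / previous >= threshold:
            alerts.append(week)
    return alerts


# Illustrative call log: (call date, call duration in seconds).
calls = [
    (date(2010, 9, 6), 300.0),
    (date(2010, 9, 8), 320.0),
    (date(2010, 9, 13), 350.0),
    (date(2010, 9, 20), 360.0),
]

series = weekly_means(calls)
alerts = find_alerts(series)
```

A report-style variant would simply return `series` to the analyst every week, whether or not any entry crosses the threshold.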

First, the disclosure addresses the transcription module 204. In some situations, manual transcriptions are feasible, preferable, or even mandated by law. For example, the proceedings of many courts and parliaments are manually transcribed, and others are recorded. In many speech analytics systems, an automatic speech recognition (ASR) transcription system performs transcription. In this case, the quality of analyses done by the speech analytics system 200 greatly depends on the accuracy of the transcriptions.

Building accurate ASR systems for call center recordings is often particularly challenging because the vendors of call center recording and archiving equipment aim to minimize storage requirements. Vendors tend to compress the audio to 6000 samples per second or less, which might still be enough for people to understand it. ASR systems are not nearly as good as people at understanding poor quality recordings, but call center recording equipment is typically not designed for ASR.

Another challenge for any large-vocabulary ASR system used in production is that language tends to evolve. For example, a company can introduce new products and services, whose names subsequently come up in call center recordings. Accordingly, the accuracy of an ASR system used by a speech analytics system 200 will degrade over time, unless a user updates the vocabulary. Even if an ASR system is updated regularly, delays between the introduction of an important new term, such as a new product name, and the ASR system's ability to recognize that term will still occur.

One consideration when building ASR systems is what counts as good ASR. Most ASR systems are built and configured to optimize well-known evaluation measures such as word error rate, but such evaluation measures are not the most relevant for many analytics purposes. In particular, for accurate speech retrieval, words that are likely to be used as search terms are more important than other words. Function words are unlikely to be search terms. When tuning the various parameters of an ASR system for use in speech analytics, the ASR system builder can remove function words from consideration by the evaluation measure. For example, the system builder can delete function words from the system's hypotheses and from the reference transcriptions. The system 200 and/or the analyst 226 can grade the remaining content words by their likelihood of being search terms, and their evaluation can be weighted accordingly.
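A rough sketch of such an evaluation measure follows. It deletes function words from both the reference and the hypothesis and then computes a weighted edit distance over the remaining content words. The function-word list, the per-word weights, and the choice to charge a substitution the larger of the two word weights are illustrative assumptions rather than details taken from the disclosure.

```python
# Illustrative function-word list; a real system would use a fuller one.
FUNCTION_WORDS = {"a", "an", "the", "of", "to", "and", "is", "i", "my"}


def content_words(text):
    """Lowercase, split on whitespace, and drop function words."""
    return [word for word in text.lower().split() if word not in FUNCTION_WORDS]


def weighted_error_rate(reference, hypothesis, weights=None, default_weight=1.0):
    """Weighted edit distance over content words, normalized by total reference weight."""
    weights = weights or {}

    def cost(word):
        return weights.get(word, default_weight)

    ref = content_words(reference)
    hyp = content_words(hypothesis)
    # dist[i][j] is the minimum weighted cost of aligning ref[:i] with hyp[:j].
    dist = [[0.0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(1, len(ref) + 1):
        dist[i][0] = dist[i - 1][0] + cost(ref[i - 1])  # deletions only
    for j in range(1, len(hyp) + 1):
        dist[0][j] = dist[0][j - 1] + cost(hyp[j - 1])  # insertions only
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            r, h = ref[i - 1], hyp[j - 1]
            substitution = 0.0 if r == h else max(cost(r), cost(h))
            dist[i][j] = min(
                dist[i - 1][j - 1] + substitution,
                dist[i - 1][j] + cost(r),  # deletion of a reference word
                dist[i][j - 1] + cost(h),  # insertion of a hypothesis word
            )
    total = sum(cost(word) for word in ref)
    return dist[-1][-1] / total if total else 0.0


reference = "i want to cancel my widget order"
hypothesis = "i want a cancel the order"

plain_rate = weighted_error_rate(reference, hypothesis)
# Treat "widget" as a likely search term by giving it a higher weight.
search_weighted_rate = weighted_error_rate(reference, hypothesis, {"widget": 3.0})
```

In this example the function-word errors ("to" heard as "a", "my" heard as "the") cost nothing, while missing the likely search term "widget" is penalized more heavily once it is given a larger weight.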

Another important trade-off in ASR systems is errors of omission versus errors of commission, also known as deletion and insertion errors, respectively. The standard error measures treat the two types of error equally, but insertion errors are much more damaging to most speech retrieval systems than deletion errors because listening to conversations is time-consuming. Analysts usually try to minimize the number of conversations that they listen to for the purpose of any particular analysis. Thus, on the one hand, retrieving irrelevant conversations wastes the analyst's time. On the other hand, if search terms are obtained from the output of trend analysis as described below, then those terms are likely to appear in many conversations. Therefore, finding a few relevant conversations is typically not difficult, and so deletion errors can be tolerated more easily. This trade-off between insertion and deletion errors can be controlled by the word insertion penalty parameter in most modern speech decoders. This trade-off can be optimized given an objective evaluation measure, such as word error rate.

The disclosure now turns to a more in-depth discussion of the text feature extraction module 208. At a minimum, the text feature extraction module 208 records which words occur in which transcripts. To enable efficient searches for multi-word phrases, the database can use data structures such as suffix trees. The text feature extraction module 208 can then implement the algorithms necessary to populate these data structures. Many other kinds of features can be inferred from the words and phrases in transcripts.
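The sketch below illustrates the idea of recording which words occur in which transcripts and then searching for multi-word phrases. Note that it uses a positional inverted index as a simpler stand-in for the suffix trees mentioned above; the function names and sample transcripts are hypothetical.

```python
from collections import defaultdict


def build_index(transcripts):
    """Map each word to {transcript id: [positions where the word occurs]}."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in transcripts.items():
        for position, word in enumerate(text.lower().split()):
            index[word][doc_id].append(position)
    return index


def find_phrase(index, phrase):
    """Return sorted ids of transcripts containing the words of `phrase` consecutively."""
    words = phrase.lower().split()
    if not words or any(word not in index for word in words):
        return []
    hits = []
    for doc_id, starts in index[words[0]].items():
        for start in starts:
            # Each later word must appear at the matching offset in the same transcript.
            if all(
                start + offset in index[word].get(doc_id, ())
                for offset, word in enumerate(words[1:], 1)
            ):
                hits.append(doc_id)
                break
    return sorted(hits)


# Illustrative transcripts keyed by a hypothetical conversation id.
transcripts = {
    1: "I got the wrong number sorry",
    2: "the number is wrong",
    3: "wrong number again",
}

index = build_index(transcripts)
matches = find_phrase(index, "wrong number")
```

A suffix tree would answer the same phrase queries without scanning position lists, at the cost of a more involved construction.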

One way to infer features in the context of speech analytics is to classify each conversation into one or more of a set of predefined classes. For example, the system 200 can tag each call to a call center with one or more reasons for the call, or tag each lecture in a series with cross-references to related topics. A system designer or the analyst 226 can design the classes to be used. Automatic classifiers built using machine learning techniques, such as boosting, can then perform the classification.

Like other texts, conversation transcripts exhibit syntactic structure, dialogue structure, and discourse structure. Information about these structures can be very useful for speech retrieval and higher analytics functionality. For example, if syntactic dependencies are recorded in the database 218, the analyst 226 can search for all records where “widgets” depended on “purchase”, even if “widgets” was preceded by a quantifier like “two boxes of”. If dialogue structure is recorded in the database, the analyst 226 can search for all records where the service agent (rather than the customer) used profanity. If discourse structure is recorded in the database, the analyst can search for all records where the words “wrong number” occurred in the “Reason for Call” section of the conversation, as opposed to other sections. This task can be more difficult for speech transcripts than other bodies of text because transcripts usually contain ASR errors, and because transcriptions lack orthographic clues such as case and punctuation. The impoverished orthographic representation can necessitate customization of basic natural language processing tools, such as part-of-speech taggers, named-entity taggers, co-reference resolvers, and syntactic parsers. On the other hand, speech contains acoustic information that can often be exploited to compensate for the shortcomings of transcripts. Approaches to exploit acoustic information are discussed below with regard to the acoustic feature extraction module 210.

Another type of language analysis is opinion extraction. This type of analysis can be particularly relevant for customer service call centers, where tracking and/or improving customers' opinions about a product or its producer is often a large part of a call center's entire purpose. However, some commonly employed opinion extraction techniques are much less reliable when applied to noisy ASR output. One approach to avoid compounding errors from ASR and opinion extraction is to transcribe speech directly into opinions without first transcribing into words. In addition to serving as search constraints, classification, structural, and opinion features can play an important role in trend analyses. Trend analysis is discussed in more detail below.

The disclosure now turns to a more in-depth discussion of the acoustic feature extraction module 210. Acoustic information is what distinguishes speech analytics from text analytics. Some acoustic features can be used by a speech analytics system by themselves. For example, the acoustic feature extraction module 210 can classify speakers into different emotional states. Reliable information about customers' emotional states can greatly help an analyst to focus on problematic calls in a call center or other environment. Similarly, reasons for customer dissatisfaction often correlate to the emotional intensity of a call, as represented by pitch variance and loudness.

The system 200 can store acoustic features in the database. Speaker segmentation and classification is one example of where acoustic information is used together with the transcript to infer additional useful features. If the system 200 knows which words were spoken by agents and which words by customers, then the system 200 can index the two sources of speech separately. Then, the analyst 226 can search for calls where certain words were spoken specifically by one source and not the other.

Speaker segmentation and classification is relatively straightforward when customers and agents are recorded on separate channels, but unfortunately most call center equipment records both agents and customers on a single channel. In these cases, the speech analytics system 200 is forced to rely on automatic segmentation and classification methods. In one implementation, these methods use information both from the acoustics and from the transcript. The transcript provides clues about where speaker turns start and end, such as words that are often seen around these transition points. A language model can also help to distinguish word sequences that a customer is more likely to say from word sequences that a service agent is more likely to say. Clues about speaker changes provided by acoustics include sharp changes in mean pitch over adjacent time windows (such as in a typical conversation between a male and a female), changes in formant distributions, rising pitch contours which often indicate a question, falling pitch and energy near the end of declarative utterances, and longer than average silences. All of these clues indicate a higher likelihood of a speaker transition. The system 200 can take both acoustic and text features into account for speaker segmentation and classification.
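Two of the acoustic clues listed above, a sharp change in mean pitch between adjacent windows and a longer than average silence, can be combined into a crude per-boundary score, as sketched below. The window representation, field names, and thresholds are assumptions chosen for illustration. A practical system would likely learn how to weight such clues together with transcript-based ones rather than simply count them.

```python
def transition_scores(windows, min_pitch_jump_hz=40.0, min_silence_sec=1.0):
    """Count acoustic clues of a speaker change at each boundary between adjacent windows.

    Each window is a dict with hypothetical keys:
      mean_pitch_hz        - mean pitch measured over the window
      trailing_silence_sec - silence between this window and the next
    """
    scores = []
    for previous, current in zip(windows, windows[1:]):
        score = 0
        # Clue 1: sharp change in mean pitch across the boundary.
        if abs(current["mean_pitch_hz"] - previous["mean_pitch_hz"]) >= min_pitch_jump_hz:
            score += 1
        # Clue 2: unusually long silence before the next window starts.
        if previous["trailing_silence_sec"] >= min_silence_sec:
            score += 1
        scores.append(score)
    return scores


# Illustrative windows: two low-pitched windows followed by two higher-pitched ones.
windows = [
    {"mean_pitch_hz": 120.0, "trailing_silence_sec": 0.2},
    {"mean_pitch_hz": 125.0, "trailing_silence_sec": 1.5},
    {"mean_pitch_hz": 210.0, "trailing_silence_sec": 0.1},
    {"mean_pitch_hz": 205.0, "trailing_silence_sec": 0.3},
]

scores = transition_scores(windows)
```

Here only the boundary between the second and third windows shows both clues, so it is the most likely speaker transition.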

Beyond speaker segmentation, the system 200 can apply acoustic and text features in conversation segmentation. For example, in many call centers, when an agent puts a customer on hold, the customer hears pre-recorded advertisements for one or more of the company's products until the agent comes back on the line. These advertisements are not really part of the conversation between the customer and the agent, but they are nevertheless included in the recording of that conversation. The transcripts of these advertisements can be a problem if the analyst 226 attempts to retrieve calls that mention one of the advertised products. The analyst 226 is typically interested in calls where a product is mentioned by the customer or the agent, not in an ad. Since any given ad usually appears in many calls, the analyst 226 can be swamped with retrieved records where the product was mentioned only in an ad, making it difficult to find what the analyst 226 is really looking for. The speech analytics system 200 can segment the recording into ad and non-ad segments, and then filter out the ads.

In one implementation, conversation recordings include information about where ads begin and end. However, in other implementations, this information is unavailable due to cost, equipment, or other limitations. So, the speech analytics system 200 can find the ads automatically. One way to find these ads automatically is based on both acoustic and transcript information. In one example of the acoustic side, voices in ads may vary their pitch much more than the voices of agents or customers. In an example of the transcript side, because any given ad appears in many calls, the n-grams that constitute an ad will have much higher frequency, on average, than the n-grams surrounding the ad.
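The transcript-side clue can be sketched as follows. This is only an illustration under stated assumptions: the helper names, the use of trigrams, and the `min_share` cutoff (the fraction of calls in which an n-gram must appear before it is treated as ad text) are hypothetical and are not taken from the disclosure.

```python
from collections import Counter


def ngrams(tokens, n=3):
    """Return the word n-grams of a token list, in order."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def call_frequency(transcripts, n=3):
    """Count, for each n-gram, how many calls contain it at least once."""
    counts = Counter()
    for tokens in transcripts:
        counts.update(set(ngrams(tokens, n)))
    return counts


def flag_ad_tokens(tokens, counts, num_calls, n=3, min_share=0.5):
    """Mark tokens covered by an n-gram seen in at least min_share of calls."""
    flags = [False] * len(tokens)
    for i, gram in enumerate(ngrams(tokens, n)):
        if counts[gram] / num_calls >= min_share:
            for j in range(i, i + n):
                flags[j] = True
    return flags


# Illustrative transcripts: the same ad text appears in two of three calls.
calls = [
    "hello i need help try our new phone plan today thanks".split(),
    "my bill is wrong try our new phone plan today goodbye".split(),
    "please cancel my service".split(),
]
counts = call_frequency(calls)
flags = flag_ad_tokens(calls[0], counts, len(calls), min_share=0.6)
non_ad_words = [w for w, is_ad in zip(calls[0], flags) if not is_ad]
```

In this toy data the words remaining after filtering the first call are the ones spoken outside the repeated ad. A real corpus would need a far lower share threshold and some protection for common phrases such as greetings.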

The disclosure now turns to a more in-depth discussion of the relational feature extraction module 216. The job of the relational feature extraction module 216 is to convert metadata 214 into a form suitable for storing in a relational DBMS 212, 218. Most of the metadata 214 associated with conversations in the speech analytics system 200 is atomic, in the sense that it does not represent anything beyond itself. For example, calls to a call center come with information about who called, who answered, and when the conversation started and ended. Thus, most of the work of the relational feature extraction module 216 includes adding these unstructured pieces of data to separate fields in the relevant database records.

Some metadata is not atomic, however. For example, a customer ID attached to a call might be the key to a great deal of information that a company has about that customer. An agent or an analyst 226 might well have reason to search for calls from customers of a certain type. For efficiency, the system 200 can add the customer information to the call record in advance. The relational feature extraction module 216 can perform a join operation between the call records and the customer records.
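The join described above can be sketched with Python's built-in `sqlite3` module. The table names, column names, and sample rows are hypothetical and serve only to show customer attributes being copied into the call records ahead of time.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical schema; the disclosure does not define one.
    CREATE TABLE customers (customer_id TEXT PRIMARY KEY, plan TEXT, region TEXT);
    CREATE TABLE calls (call_id INTEGER PRIMARY KEY, customer_id TEXT, started TEXT);
    INSERT INTO customers VALUES ('c1', 'premium', 'KS'), ('c2', 'basic', 'NY');
    INSERT INTO calls VALUES (1, 'c1', '2010-09-01T10:00'),
                             (2, 'c2', '2010-09-01T10:05'),
                             (3, 'c1', '2010-09-02T09:30');
    -- Precompute the join so later searches need only one table.
    CREATE TABLE call_records AS
        SELECT calls.call_id, calls.customer_id, calls.started,
               customers.plan, customers.region
        FROM calls LEFT JOIN customers
            ON calls.customer_id = customers.customer_id;
""")
premium_calls = conn.execute(
    "SELECT call_id FROM call_records WHERE plan = 'premium' ORDER BY call_id"
).fetchall()
```

After the join, a search for calls from customers of a certain type, here the made-up `premium` plan, touches only the enriched call records.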

Further, the metadata 214 can be hierarchically structured. One example is information about a company's employees, who are usually organized in a hierarchical reporting structure. A customer service agent reports to a supervisor, who reports to a floor manager, who reports to a call center manager, who reports to a VP, who reports to the CEO. An analyst 226 interested in the effectiveness of managers at different levels of the hierarchy can analyze calls taken by all the agents that report to a certain supervisor, or calls taken by all the agents whose supervisors report to a certain manager, and so forth. In one aspect, the analyst 226 can compare two sets of calls by agents reporting to different supervisors, some of which may be overlapping if certain employees report to multiple supervisors. To support such queries, the relational feature extraction module 216 can flatten the hierarchy by means of transitive closure. In other words, the relational feature extraction module 216 can create a separate database table that records every pair of employees that occur on some shortest path from the root to a leaf in the company hierarchy.
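Flattening a reporting hierarchy by transitive closure can be sketched as below. The function name and the `reports_to` mapping are assumptions for illustration. For simplicity the sketch records every (manager, subordinate) pair reachable through any chain of reporting relationships, which is a slightly broader set than the shortest-path wording above when an employee has multiple supervisors.

```python
def transitive_closure(reports_to):
    """Return every (manager, subordinate) pair, direct or indirect.

    reports_to is a hypothetical mapping from each employee to the list
    of that employee's direct managers.
    """
    pairs = set()
    for employee, managers in reports_to.items():
        stack = list(managers)
        seen = set()
        while stack:
            manager = stack.pop()
            if manager in seen:
                continue
            seen.add(manager)
            pairs.add((manager, employee))
            # Walk further up the hierarchy from this manager.
            stack.extend(reports_to.get(manager, []))
    return pairs


# Illustrative hierarchy: agent_b reports to two supervisors.
reports_to = {
    "agent_a": ["sup_1"],
    "agent_b": ["sup_1", "sup_2"],
    "sup_1": ["mgr"],
    "sup_2": ["mgr"],
}
closure = transitive_closure(reports_to)
under_mgr = sorted(sub for boss, sub in closure if boss == "mgr")
```

Stored as a two-column table, such pairs let a query such as "all calls taken by anyone under a given manager" be answered with a single join rather than a recursive traversal.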

The disclosure now turns to a more in-depth discussion of the database management system (DBMS) 218. The DBMS 218 performs two tasks. On the server side, the DBMS 218 searches the database to find records that satisfy the analyst's 226 query 224. On the client side, the DBMS 218 enables the analyst 226 to easily formulate the query 224, to understand the results 246, and to iteratively refine the query based on the results.

The database 212 can be relational, except for the text index, or any other type of database. However, relational databases are commonly used because they are mature and efficient enough for the kind of data in most speech analytics systems. The database can be searched via standard or other query languages such as SQL, which include text search primitives and text indexing facilities. Queries about text fields can be combined with queries about other fields in relational DBMSs. The real challenge is to design an intuitive user interface for the analyst 226, which hides the complexities of the database from the analyst 226 without limiting the power and flexibility of the database. Relational query languages can include syntax for specifying a set of constraints on the values of fields in the database's records, a set of fields to display from the records that satisfy the constraints, and a method for ordering those records, as well as other commands.

As mentioned above, some query languages provide part or all of the first layer of analytics functionality. For example, using nothing but SQL on the command line, a skilled analyst 226 can group the records returned by the DBMS 218 on the values of certain fields, count the number of records in each group, sum or average the numeric fields within each group, and then sort on one of those sums or averages, to find the groups with the smallest or largest values in those fields. The basic operations involved in this analysis are grouping, counting, summing, averaging, and sorting. These operations fall under the definition of analytics, because they aggregate information from multiple conversations. Some speech analytics systems do not allow analysts 226 to formulate queries 224. Instead, queries 224 are formulated in advance by the designers of the system 200, based on their understanding of what analysts 226 might need. The system 200 can then proactively issue the queries to the DBMS 218 on a recurring basis for trend analysis and alerting.
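The grouping, counting, averaging, and sorting operations named above can be sketched as a single SQL statement, run here through Python's `sqlite3` module. The `calls` table, its columns, and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical call table: one row per conversation.
conn.execute("CREATE TABLE calls (agent_id TEXT, duration_min REAL, score INTEGER)")
conn.executemany(
    "INSERT INTO calls VALUES (?, ?, ?)",
    [("a1", 4.0, 5), ("a1", 6.0, 4), ("a2", 12.0, 2), ("a2", 8.0, 3), ("a3", 5.0, 4)],
)
# Group by agent, count and average within each group, then sort so the
# agents with the lowest average score come first.
rows = conn.execute("""
    SELECT agent_id,
           COUNT(*)          AS num_calls,
           AVG(duration_min) AS avg_duration,
           AVG(score)        AS avg_score
    FROM calls
    GROUP BY agent_id
    ORDER BY avg_score ASC
""").fetchall()
lowest_rated_agent = rows[0][0]
```

A query of this shape could equally be one of the pre-formulated queries that a system issues on a recurring basis.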

The disclosure now turns to a more in-depth discussion of the media server and player 220. Once the analyst 226 finds a set of conversations 202 that match the search criteria in the query 224, the analyst 226 may want to listen to some of them. To support such audio browsing, the detail report can include hyperlinks to the relevant audio files. If an analyst clicks one of these hyperlinks, an audio player 220 is launched to play back the corresponding conversation.

The difficulty with audio browsing is that an analyst 226 cannot skim audio files the way that he or she can skim text or hypertext files. Without special tools, the analyst 226 has no way to skip to the parts of an audio file that pertain to a certain topic. When dealing with long conversations, the analyst 226 can waste a significant quantity of time listening to speech in which he or she is not interested.

When speech browsing follows speech retrieval, the system 200 can use the query 224 used for retrieval, together with the transcripts 206, to ameliorate this problem. Specifically, if the query 224 includes one or more search terms, then the analyst 226 can skip directly to the part(s) of a conversation where those terms were spoken. Several techniques can make this possible. First, the transcripts 204 can include information about the time offset of each word that they contain. Second, the media player 220 can start playing an audio source from a specific time offset. Third, since the audio files that contain long conversations can be very large, the media server can take a long time to stream to the media player. So, for optimum effectiveness, the media server should be able to start streaming audio from a specified time offset.
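The first technique, looking up the time offset at which a search term was spoken, can be sketched as follows. The `(word, start_seconds)` transcript representation and the function name are assumptions made for illustration.

```python
def find_term_offsets(timed_words, term):
    """Return the start time of every occurrence of a search term.

    timed_words is a hypothetical transcript: a list of
    (word, start_seconds) pairs. The term may span several words.
    """
    target = term.lower().split()
    if not target:
        return []
    words = [word.lower() for word, _ in timed_words]
    return [
        timed_words[i][1]
        for i in range(len(words) - len(target) + 1)
        if words[i:i + len(target)] == target
    ]


# Illustrative transcript with per-word time offsets in seconds.
transcript = [
    ("I", 0.0), ("want", 0.4), ("to", 0.7), ("cancel", 0.9), ("my", 1.5),
    ("service", 1.8), ("please", 2.6), ("cancel", 30.2), ("it", 30.8),
]
single_word = find_term_offsets(transcript, "cancel")
phrase = find_term_offsets(transcript, "cancel my service")
```

The returned offsets are what a player would seek to, and what a server would use as the starting point for streaming.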

An exemplary audio player can make it easy for an analyst 226 to skip to instances of search terms in a recording. The audio player can include some components that exist in many other software audio players, such as a progress bar, a Play/Pause button, and playback speed control. However, the audio player can include additional features specifically designed for audio browsing. For example, the analyst 226 can click anywhere in the progress bar to play the recording from that point. This feature is particularly useful when the progress bar indicates the locations of the search terms in the transcript, such as with vertical tic marks. The audio player can display a list of word contexts where the search terms are found in the transcript. The analyst 226 can click on any of these contexts to listen to the corresponding audio segment. A CC button can turn on closed-captioning, which displays a moving window of words in the transcript synchronized with the audio playback. Closed captioning can be helpful for following the recording, especially when fast-forwarding.

The disclosure now turns to a more in-depth discussion of the features and abilities of the trend analysis module 236. The ability to perform trend analysis is one distinction between sophisticated speech analytics systems and simple ones. Existing approaches are designed to analyze only one particular trend. However, different analysts 226 might want to track different trends in the same database. For example, a sales analyst might want to track the number of conversations that mention the name of a particular product, whereas a call center manager might want to track the customer service ratings of different agents. The trend analysis module 236 disclosed herein is more useful and can be configured to analyze a large variety of different trends. This kind of configurable module requires a set of constraints on the records to be used in the analysis, a set of constraints on the candidate features 240, and an objective function 234 with respect to which trends will be analyzed. These three items are discussed below, in turn.

The trend analysis module 236 compiles a query out of the record constraints. The trend analysis module 236 sends this query to the DBMS to obtain a subset of records from the database, just as if the analyst were doing retrieval. However, the system 200 sends the set of records 246 returned by the DBMS 218 to the feature filtering module 242, instead of displaying them to the analyst.

Almost any aspect of a conversation 202 or its metadata 214 can be a feature, such as the frequency of certain words in the conversation, or whether any part of the conversation was through a cell phone. In the simplest scenario, the set of feature constraints 240 is empty, and all available features of each record in the set are used, including transcript features, acoustic features, metadata features, and higher-order features inferred from them. In this scenario, the feature filtering module 242 filters out nothing. More typically, however, the analyst 226 knows in advance that some features or types of features either are not relevant or are not likely to be relevant to the analysis, because common sense tells them that those features are not causally related to the objective function 234. In this case, the analyst 226 can remove some features or feature types from consideration.

The objective function 234 can be one of the features, or a mathematical function of one or more features. For example, customer satisfaction is an important objective function for most businesses, and consequently also for their speech analytics systems. In order to track caller satisfaction with a particular agent, product, or other aspect of their relationship with the company, call centers often give callers the option to take a survey at the end of the call. Such surveys typically elicit scalar responses, using questions such as “How would you rate the agent's courtesy, on a scale from 1 to 5?” The answers to the survey questions then become part of the call's metadata, and are available for statistical modeling. More generally, an objective function 234 can combine two or more features. For example, a call center analyst 226 can analyze the summed frequency of all swear words over some time period.

Given these three kinds of information, the trend analysis module 236 induces a model of how the objective function 234 can be explained by the selected features 244. The model assigns a weight to each input feature 244, which represents that feature's correlation with the objective function 234. The trend analysis module 236 ranks the features by the magnitude of their weights, and presents one or more of the highest-ranked features, or prominent features 238, to the analyst 226. Statistical regression is one possible way to assign weights to features. For a regression problem that includes a large number of features, such as the vocabulary of speech transcripts, it is important to choose a method with built-in regularization. The purpose of regression in this application is not the usual purpose. Usually, users induce regression models for the purpose of predicting the values of the objective function on future examples. In contrast, this application induces the regression model in order to determine the weights of the features, rather than to predict values for any particular examples.
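The weighting and ranking step can be sketched as follows. As a simplifying assumption, this sketch swaps in a per-feature Pearson correlation coefficient for the regularized regression mentioned above, since it needs no external libraries; it is not the method of the disclosure. The function names, the record layout, and the sample values are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric lists (0.0 if constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def rank_features(records, feature_names, objective):
    """Weight each feature against the objective and sort by |weight|."""
    targets = [objective(record) for record in records]
    weights = {
        name: pearson([float(record[name]) for record in records], targets)
        for name in feature_names
    }
    return sorted(weights.items(), key=lambda item: abs(item[1]), reverse=True)


# Illustrative records; "score" plays the role of the objective function.
records = [
    {"says_thank_you": 1, "on_hold_min": 0, "cell_phone": 1, "score": 5},
    {"says_thank_you": 1, "on_hold_min": 1, "cell_phone": 0, "score": 4},
    {"says_thank_you": 0, "on_hold_min": 6, "cell_phone": 1, "score": 2},
    {"says_thank_you": 0, "on_hold_min": 9, "cell_phone": 0, "score": 1},
]
ranked = rank_features(
    records,
    ["says_thank_you", "on_hold_min", "cell_phone"],
    objective=lambda record: record["score"],
)
```

On this toy data the hold time ranks first with a strongly negative weight, followed by the "thank you" indicator with a positive weight. With thousands of vocabulary features, independent correlations of this kind would be noisy, which is why a regularized regression is the more suitable choice in practice.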

Regardless of how the feature weights are induced, speech analytics system designers should be aware that correlation is not the same as causation. High customer satisfaction ratings might correlate highly with the phrase “thank you” in conversation transcripts, but that phrase is probably not the cause of customer satisfaction. In fact it is more likely to be the other way around. The approaches disclosed herein provide a reliable and efficient tool for an analyst to explore correlations among a large number of variables and to investigate which of them may reflect causation.

As with the other speech analytics functionalities that involve an analyst 226, trend analysis 236 is only as effective as the user interface. A well-designed user interface offers the analyst 226 an easy way to focus on records with certain highly predictive features. The interface can also offer an easy way to filter out features 244 and/or feature types in an iterative manner, in order to remove those that the analyst 226 decides are unlikely to be the cause of variance in the objective function 234.

FIG. 3 illustrates an exemplary trend analysis graphical user interface 300. To get to this part of the system, the analyst 226 specifies a set of search constraints. Then, the analyst 226 can select an objective function 234 from a drop-down menu (not shown), and click a separate analyze button. In this example, the objective functions 234 are limited to individual numeric fields, but can be expanded to include other options. FIG. 3 illustrates the result of a trend analysis with respect to the Score field, which in this context represents a measure of customer satisfaction. The display is divided into two columns. The left column shows the five features 302, 306, 310, 314, 318 that are the most negatively correlated with the objective function, and the right column shows the five features 304, 308, 312, 316, 320 that are the most positively correlated. Both columns are sorted by magnitude from the top down, but can be presented in other arrangements as well. In this example, CTN in feature 306 stands for calling telephone number, Q1 and Q3 in features 318, 320 refer to customer responses to survey questions, ANI in feature 308 stands for automatic number identification, and FCR in feature 316 stands for first call resolution. These features are clues of where to look for a particular problem or for the source of a particular good trend. For example, a bad or incorrect training session in one region's call center can lead to bad customer survey results or poor sales in that region. These features can reveal evidence pointing to the bad or incorrect training as the source of the problem.

For each feature, the interface 300 gives a human-readable description, the correlation value, a bar graph to help the analyst 226 visually compare correlation magnitudes, and check boxes, not shown. The analyst 226 can use check boxes, along with the buttons 322, 324, 326, 328 at the bottom of the interface 300, for iterative refinement of the analysis. One example check box for each feature is an Ignore check box. If the analyst 226 checks this box for one or more of the displayed features and clicks the Analyze button 328 again, then the system repeats the analysis without those features. Thus, other features will make it into the list of top five most highly correlated features, and will be displayed instead of the features that were ignored. Another example check box is a Focus check box. If the analyst 226 checks this box for one or more of the displayed features and clicks the Analyze button 328, the system repeats the analysis just for the subset of records whose values on those features contributed to the magnitude of the correlation. For scalar features and positive correlations this can mean having a value above the mean, and for negative correlations a value below the mean. For Boolean features, this can mean having a value of True for positive correlations and False for negative correlations. Of course, users can also check a combination of Focus and Ignore check boxes, in which case the reanalysis focuses on a subset of the records while ignoring certain features.
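The Ignore and Focus refinements can be sketched as two small filters, one over the feature list and one over the records. The function names and the record layout are hypothetical, and the treatment of a zero correlation (handled like a negative one here) is an arbitrary choice made for this sketch.

```python
def apply_ignore(feature_names, ignored):
    """Drop the features whose Ignore box was checked."""
    return [name for name in feature_names if name not in ignored]


def apply_focus(records, feature, correlation):
    """Keep the records whose value on `feature` contributed to the correlation.

    Boolean features keep True for positive correlations and False for
    negative ones; scalar features keep values above the mean for
    positive correlations and below the mean for negative ones.
    """
    if all(isinstance(record[feature], bool) for record in records):
        wanted = correlation > 0
        return [record for record in records if record[feature] == wanted]
    values = [float(record[feature]) for record in records]
    mean = sum(values) / len(values)
    if correlation > 0:
        return [record for record in records if record[feature] > mean]
    return [record for record in records if record[feature] < mean]


# Illustrative records and feature names.
records = [
    {"id": 1, "hold_min": 0, "repeat_caller": False},
    {"id": 2, "hold_min": 2, "repeat_caller": True},
    {"id": 3, "hold_min": 10, "repeat_caller": True},
]
remaining = apply_ignore(["hold_min", "repeat_caller", "region"], {"region"})
short_holds = [r["id"] for r in apply_focus(records, "hold_min", -0.8)]
repeaters = [r["id"] for r in apply_focus(records, "repeat_caller", 0.6)]
```

Repeating the analysis on the filtered records with the remaining features corresponds to clicking the Analyze button again with a combination of Focus and Ignore boxes checked.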

After checking some check boxes, the analyst 226 can click the Search button 326 instead of the Analyze button 328. The system 200 returns the interface to a search and retrieval interface, and immediately executes a search for the records that satisfy the conjunction of the Focus constraints and the originally specified search constraints. In this manner, the analyst can seamlessly switch between the retrieval and trend analysis functionality layers of the system.

In one approach to trend analysis, time is the objective function 234. When the analyst 226 chooses time as the objective function 234, the analyst 226 is effectively asking “What features change the most over time?” Examples of possible answers to this question that might interest the analyst 226 include the average customer satisfaction rating, the volume of calls from Kansas, and the frequency of the phrase “cancel my service”. When such answers come up, they often reveal problems that were previously unknown, giving analysts or their organization an early opportunity to address them.

The time objective function 234 is sometimes misused when the objective function 234 of interest is actually something else. If the analyst 226 wants to determine why customer satisfaction ratings fluctuate over time, and they suspect that the answer lies in conversation transcripts, the analyst 226 can run trend analysis with time being the objective function and the features taken from the transcripts, to see what comes out on top. Such an analysis can sometimes reveal valuable clues, but there is a more direct and more reliable way to do it. What the analyst 226 really wants to do is feature selection with respect to two objective functions: the ratings and time. Existing speech analytics systems do not offer this functionality, which is a major reason why the time objective function is often misused.

Another example of two objectives that can be usefully analyzed together is customer satisfaction ratings and call durations. Most call centers strive to maximize the former while minimizing the latter. These goals are difficult to achieve simultaneously, because the values of these two statistics tend to rise and fall together. However, the correlation is far from perfect, and there are typically many ways to influence the two statistics disproportionately. The analyst 226 can run trend analysis with these two objective functions to discover features that affect one statistic more than the other.

From a mathematical point of view, feature selection with two objective functions is a generalization of feature selection with one objective function 234. Instead of asking which features best explain the variance of one objective, the approach disclosed herein asks which features best explain the covariance of two objectives. Multivariate regression is one type of statistical regression for answering such questions. Naturally, these algorithms can be further generalized to three or more objective functions, but it is more difficult to imagine useful cases for such generalizations in a speech analytics system. The trend analysis GUI can be generalized to accommodate two or more objective functions. For example, the GUI can substitute the pull down menu of objective functions with a set of check boxes so that the analyst 226 can select any number of functions.

The disclosure now turns to a more in-depth discussion of the features and abilities of the alerting system module 230. Effective alerting is a long-sought-after goal of analytics systems, especially an analytics system that automatically generates an alert whenever an important change occurs in the data. Alerting systems necessarily deal with time series, since their output would always be the same if time were not considered. For the purposes of speech analytics, the system is concerned with discrete time series, since records in the database are discrete. One of the inputs to the alerting system is a time interval size. Each element of a time series represents records having timestamps that fall into the same interval of the given size. Independent of the interval size is the interval offset, which can be greater than, equal to, or less than the interval size. If the interval offset is equal to the interval size, then the intervals are disjoint and contiguous; if the interval offset is less than the interval size, they overlap; and if it is greater, gaps fall between consecutive intervals. The alerting system can analyze many time series.
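The relationship between interval size and interval offset can be sketched as below. Timestamps are represented as plain numbers (for example, hours), and the function names are hypothetical; half-open intervals are an assumption chosen so that disjoint intervals never share a boundary record.

```python
def make_intervals(start, end, size, offset):
    """Return half-open [lo, hi) intervals of length `size`, starting
    every `offset` units from `start` until `end` is reached."""
    intervals = []
    lo = start
    while lo < end:
        intervals.append((lo, lo + size))
        lo += offset
    return intervals


def bucket(timestamps, intervals):
    """Group timestamps into time-series elements, one list per interval."""
    return [[t for t in timestamps if lo <= t < hi] for lo, hi in intervals]


# Illustrative call timestamps, in hours since midnight.
timestamps = [1, 5, 11, 14, 23]

# Offset equal to size: disjoint intervals.
disjoint = bucket(timestamps, make_intervals(0, 24, size=12, offset=12))

# Offset less than size: overlapping intervals, so a record can
# contribute to more than one element of the time series.
overlapping = bucket(timestamps, make_intervals(0, 24, size=12, offset=6))
```

With the 6-hour offset, the call at hour 11 falls into both the first and the second interval.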

The process of analyzing a large number of events for anomalies is susceptible to false alarms. False alarms reduce the effectiveness of alerting systems, because they dilute the attention of the analyst 226. Even when sophisticated statistical methods are employed to reduce the false alarm rate, the analyst 226 can still restrict the records that participate in the alerting module using a query, the same way as they would for retrieval or trend analysis. The analyst 226 can also configure the alerting system module 230 to analyze time series for a hand-picked set of features, or for all the features of a given type, such as all the customer service agents, or the 1,000 most frequent words in the vocabulary.

The system 200 can derive the numeric value corresponding to each element in a time series by summing and/or averaging the value of one particular feature of the relevant records, such as the frequency of a particular word or call duration. However, the system 200 can also derive the numeric values from arbitrary mathematical transformations or aggregations of the chosen features. One possibility is to construct a time series of the weights that the trend analysis module 236 assigns to one or more features 238 with respect to an objective function 234 of interest. Such a time series would enable alerts such as “agent ID abc123 has become highly correlated with increased customer service ratings.” To check for such an alert condition, the alerting system module 230 can call the trend analysis module 236 to get feature weights.

Having defined some time series, the analyst 226 can choose between alerts about the values at individual time tics, and alerts about changes in those values over time. In the case of changes over time, the analyst 226 can specify a length of time against which to compare. For example, one analyst wants to be notified whenever the mean call duration exceeds 10 minutes, another analyst wants to be notified whenever the mean call duration significantly increases from one day to the next, and a third analyst wants to be notified whenever the mean call duration for a given day is significantly higher than the mean for the preceding hundred days. What counts as a significant increase or decrease can be expressed in absolute terms, such as 2 minutes, or in relative terms, such as 20% less or more than an average value. After deciding between individual values and changes of values, and, if applicable, between absolute and relative changes, the analyst 226 can also supply the threshold value, such as 10 minutes or 20%.
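The three kinds of alert condition described above can be sketched as follows. The function names, the strict "greater than" comparison, and the choice to compare each element against the mean of the preceding `lookback` elements are assumptions for illustration.

```python
def value_alerts(series, threshold):
    """Indices of elements whose value exceeds an absolute threshold."""
    return [i for i, value in enumerate(series) if value > threshold]


def change_alerts(series, lookback, threshold, relative=False):
    """Indices of elements that differ from the mean of the preceding
    `lookback` elements by more than `threshold`.

    With relative=False the threshold is in the units of the series;
    with relative=True it is a fraction of the baseline (0.2 means 20%).
    """
    alerts = []
    for i in range(lookback, len(series)):
        baseline = sum(series[i - lookback:i]) / lookback
        change = series[i] - baseline
        if relative:
            if baseline == 0:
                continue  # a relative change is undefined here
            change = change / baseline
        if abs(change) > threshold:
            alerts.append(i)
    return alerts


# Illustrative daily mean call durations, in minutes.
mean_call_minutes = [6.0, 6.5, 6.0, 9.0, 11.0]

over_ten_minutes = value_alerts(mean_call_minutes, 10)
day_over_day = change_alerts(mean_call_minutes, lookback=1, threshold=2.0)
vs_recent_mean = change_alerts(
    mean_call_minutes, lookback=3, threshold=0.2, relative=True
)
```

On this toy series the absolute-value alert fires only on the last day, the day-over-day alert fires on the jump from 6.0 to 9.0 minutes, and the relative alert against the three-day mean fires on both of the last two days.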

The analyst 226 can also specify how often and under what conditions the alerting system module 230 is to notify the analyst 226 that an alert was triggered. In the simplest scenario, the alerting system module 230 sends alerts 232 to the analyst via an email or SMS message whenever one of the alerts is triggered. To prevent a flood of alerts, the analyst can instead request that the alerting system module 230 send alerts 232 no more than once per hour, per day, or per week, for example. Alternatively, the alerting system module 230 can publish the alerts 232 on an alert page or dashboard whenever the analyst 226 logs into the system 200. In yet another variation, the alerting system module 230 can compile a report of alerts 232 over a given interval and send the report to the analyst. In some cases, the report can include a status report of non-alert events, such as “call duration: normal”, to reassure the analyst 226 that those non-alert events are within normal tolerances. The analyst 226 can set certain exceptions to these general guidelines for the alerting system module 230 to handle critical or highly important alerts appropriately. For example, the analyst 226 can set an exception that if customer satisfaction drops below a certain threshold, then the alerting system module 230 is to send an alert 232 immediately. In some cases, the alerting system module 230 sends multiple alerts 232 to multiple individuals and/or over multiple media for a single trigger. Another possible degree of flexibility is to notify the analyst 226 only when some conjunction of alert conditions is triggered.
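The combination of a minimum delay between notifications and an exception for critical alerts can be sketched as below. The `Notifier` class, its method names, and the decision to batch held alerts into the next notification are hypothetical design choices, not details from the disclosure.

```python
class Notifier:
    """Hold alerts so that notifications go out at most once per
    min_delay_seconds, except that critical alerts go out immediately."""

    def __init__(self, min_delay_seconds):
        self.min_delay_seconds = min_delay_seconds
        self.last_sent = None
        self.pending = []

    def offer(self, alert, now, critical=False):
        """Register an alert at time `now` (in seconds) and return the
        batch of alerts to send right away, or [] if they are held."""
        self.pending.append(alert)
        due = (
            self.last_sent is None
            or now - self.last_sent >= self.min_delay_seconds
        )
        if critical or due:
            batch, self.pending = self.pending, []
            self.last_sent = now
            return batch
        return []


# Illustrative use: at most one notification per hour.
notifier = Notifier(min_delay_seconds=3600)
first = notifier.offer("call duration high", now=0)
held = notifier.offer("word frequency spike", now=600)
urgent = notifier.offer("satisfaction below threshold", now=1200, critical=True)
later = notifier.offer("call volume high", now=5000)
```

In this sketch, the held alert is delivered together with the critical one, so nothing is lost while the flood of separate messages is avoided.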

The specification of a trigger in a fully flexible alerting system can include a time interval size, a time interval offset, a set of record constraints, a set of feature constraints, an objective function (sum, average, weight, etc.), choice between analyzing individual values or changes over time, if analyzing changes then a choice between absolute or relative changes, if analyzing relative changes then a length of time to compare against, a threshold of significance, which alerts or conjunctions of alerts should generate a notification to the analyst, and/or the minimum delay between consecutive notifications.
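The elements of a trigger specification listed above can be gathered into a single record, sketched here as a Python dataclass. All field names, units, and default values are hypothetical, as are the consistency checks in `validate`.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TriggerSpec:
    """Hypothetical container for one alert trigger specification."""
    interval_size_hours: float
    interval_offset_hours: float
    record_constraints: dict = field(default_factory=dict)
    feature_constraints: list = field(default_factory=list)
    objective: str = "average"          # e.g. "sum", "average", "weight"
    analyze_changes: bool = False       # individual values vs. changes
    relative_change: bool = False       # absolute vs. relative changes
    compare_window_hours: Optional[float] = None
    threshold: float = 0.0
    notify_on: list = field(default_factory=list)
    min_notification_delay_hours: float = 0.0

    def validate(self):
        """Return a list of human-readable problems (empty if consistent)."""
        problems = []
        if self.interval_size_hours <= 0 or self.interval_offset_hours <= 0:
            problems.append("interval size and offset must be positive")
        if self.relative_change and not self.analyze_changes:
            problems.append("relative changes require analyzing changes")
        if self.relative_change and self.compare_window_hours is None:
            problems.append("relative changes need a comparison window")
        return problems


# Illustrative trigger: daily mean call duration, 20% above the mean of
# the preceding hundred days, with at most one notification per day.
spec = TriggerSpec(
    interval_size_hours=24,
    interval_offset_hours=24,
    record_constraints={"call_center": "example"},
    analyze_changes=True,
    relative_change=True,
    compare_window_hours=24 * 100,
    threshold=0.2,
    min_notification_delay_hours=24,
)
spec_problems = spec.validate()
```

Keeping the dependent choices in one structure makes it straightforward to reject inconsistent combinations before a trigger is installed.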

Having disclosed some basic system components, the disclosure now turns to the exemplary method embodiment for performing trend analysis as shown in FIG. 4. For the sake of clarity, the method is discussed in terms of an exemplary system 100 such as is shown in FIG. 1 configured to practice the method. First, the system 100 receives, as part of a speech trend analysis request, a set of candidate feature constraints, an objective function with respect to a speech trend to be analyzed, and a set of speech record constraints to be applied to a group of speech records (402). The group of speech records can include recorded speech, transcriptions, and/or metadata. The system 100 can use the set of candidate feature constraints to constrain transcript features, metadata features, and/or higher order features inferred from the transcript features and metadata features. In one aspect, the set of candidate feature constraints is an empty set, meaning that all features are allowed. The objective function can be a mathematical function, a single feature, a combination of features, and/or time.

The system 100 selects a subset of speech records from the group of speech records based on the set of speech record constraints to yield selected speech records (404). The system 100 can select the subset of speech records by generating a database query command corresponding to the speech record constraints, and executing the database query command. The system 100 can then pass the results of executing the database query command to a feature filtering module which identifies the features in the selected speech records. In one aspect, the system 100 does not display these results to a user.

The system 100 identifies features in the selected speech records based on the set of candidate feature constraints to yield identified features (406), assigns a weight to each of the identified features based on the objective function (408), and ranks the identified features by their respective weights to yield ranked identified features (410). In one aspect, the system outputs the ranked identified features via a user interface in which a user can focus on records with highly predictive features. In other aspects, these ranked identified features are fed to another module or system without output to a user. For example, the system 100 can feed these ranked identified features to an alerting subsystem or a report generator that generates a feature summary for display to the user without displaying the actual features summarized.

The system 100 outputs at least one of the ranked identified features associated with a speech-based trend in response to the speech trend analysis request (412). After outputting at least one of the ranked identified features, a user can revise the speech trend analysis request by filtering out one or more feature constraints from the set of candidate feature constraints. This approach allows the user to refine and drill down to a particular set of features related to the speech-based trend. Examples of this are discussed above with respect to FIG. 3.

The disclosure now turns to the exemplary method embodiment for generating an alert based on speech analytics data as shown in FIG. 5. This method is also discussed in terms of the exemplary system 100 shown in FIG. 1. The system 100 generates elements of a time series, wherein each element comprises speech records having timestamps within a same time interval (502). The user can specify the size of the time interval. Further, the elements of the discrete time series can be separated by an interval offset which can be user specified or predefined.

The system 100 generates a numeric value for each element in the time series based on a weight for each speech record, wherein the weight is based on a set of candidate feature constraints, an objective function with respect to a trend to be analyzed, and a set of record constraints to be applied to a group of records (504). The system can generate the numeric value for each element in the time series by summing respective scores for multiple features or averaging scores for multiple features, for example.

The system 100 generates an alarm when at least one respective numeric value for at least one element in the time series meets a threshold (506). The system 100 can also notify a user when a specific combination of alarms is generated. The threshold can be an absolute threshold or a relative threshold compared to other elements in the time series.

The principles disclosed herein can be used to construct a sophisticated speech analytics system, with several layers of functionality. The lowest layer is a system for speech retrieval and browsing. On top of that, a DBMS provides rudimentary relational analytics. Next, a trend analysis layer adds more sophisticated statistical analyses. Finally, an alerting system collects information from the other layers, and takes the initiative to provide actionable information in a timely manner. The system presents all of the layers to an analyst in an intuitive and integrated GUI.

Embodiments within the scope of the present disclosure may also include tangible and/or non-transitory computer-readable storage media for carrying or having computer-executable instructions or data structures stored thereon. Such non-transitory computer-readable storage media can be any available media that can be accessed by a general purpose or special purpose computer, including the functional design of any special purpose processor as discussed above. By way of example, and not limitation, such non-transitory computer-readable media can include RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to carry or store desired program code means in the form of computer-executable instructions, data structures, or processor chip design. When information is transferred or provided over a network or another communications connection (either hardwired, wireless, or a combination thereof) to a computer, the computer properly views the connection as a computer-readable medium. Thus, any such connection is properly termed a computer-readable medium. Combinations of the above should also be included within the scope of the computer-readable media.

Computer-executable instructions include, for example, instructions and data which cause a general purpose computer, special purpose computer, or special purpose processing device to perform a certain function or group of functions. Computer-executable instructions also include program modules that are executed by computers in stand-alone or network environments. Generally, program modules include routines, programs, components, data structures, objects, and the functions inherent in the design of special-purpose processors, etc. that perform particular tasks or implement particular abstract data types. Computer-executable instructions, associated data structures, and program modules represent examples of the program code means for executing steps of the methods disclosed herein. The particular sequence of such executable instructions or associated data structures represents examples of corresponding acts for implementing the functions described in such steps.

Those of skill in the art will appreciate that other embodiments of the disclosure may be practiced in network computing environments with many types of computer system configurations, including personal computers, hand-held devices, multi-processor systems, microprocessor-based or programmable consumer electronics, network PCs, minicomputers, mainframe computers, and the like. Embodiments may also be practiced in distributed computing environments where tasks are performed by local and remote processing devices that are linked (either by hardwired links, wireless links, or by a combination thereof) through a communications network. In a distributed computing environment, program modules may be located in both local and remote memory storage devices.

The various embodiments described above are provided by way of illustration only and should not be construed to limit the scope of the disclosure. For example, the principles herein can be applied in call center analytics as well as any other database of live and/or recorded speech. Those skilled in the art will readily recognize various modifications and changes that may be made to the principles described herein without following the example embodiments and applications illustrated and described herein, and without departing from the spirit and scope of the disclosure.

We claim:
 1. A method comprising: generating, via a processor, elements of a time series, wherein each element comprises speech records having timestamps within a same time interval; generating, via the processor, a numeric value for each element in the time series according to: (1) a weight for each speech record, wherein the weight is calculated using a set of candidate feature constraints; (2) an objective function with respect to a trend to be analyzed; and (3) a set of record constraints to be applied to a group of records; and generating, via the processor, an alarm when a value for at least one element in the time series meets a threshold.
 2. The method of claim 1, wherein the weight is further calculated using a statistical regression.
 3. The method of claim 1, wherein a user specifies a size of the same time interval.
 4. The method of claim 1, wherein the elements of the time series are separated by an interval offset.
 5. The method of claim 1, wherein the numerical value for each element in the time series is further generated by summing respective scores for a plurality of features.
 6. The method of claim 1, further comprising notifying a user when a specific combination of alarms is generated, the specific combination of alarms including the alarm.
 7. The method of claim 1, wherein the threshold is relative to other elements in the time series.
 8. A system comprising: a processor; and a computer-readable storage medium having instructions stored which, when executed by the processor, cause the processor to perform operations comprising: generating elements of a time series, wherein each element comprises speech records having timestamps within a same time interval; generating a numeric value for each element in the time series according to: (1) a weight for each speech record, wherein the weight is calculated using a set of candidate feature constraints; (2) an objective function with respect to a trend to be analyzed; and (3) a set of record constraints to be applied to a group of records; and generating an alarm when a value for at least one element in the time series meets a threshold.
 9. The system of claim 8, wherein the weight is further calculated using a statistical regression.
 10. The system of claim 8, wherein a user specifies a size of the same time interval.
 11. The system of claim 8, wherein the elements of the time series are separated by an interval offset.
 12. The system of claim 8, wherein the numerical value for each element in the time series is further generated by summing respective scores for a plurality of features.
 13. The system of claim 8, the computer-readable storage medium having additional instructions stored which, when executed by the processor, cause the processor to perform operations comprising notifying a user when a specific combination of alarms is generated, the specific combination of alarms including the alarm.
 14. The system of claim 8, wherein the threshold is relative to other elements in the time series.
 15. A computer-readable storage device having instructions stored which, when executed by a computing device, cause the computing device to perform operations comprising: generating elements of a time series, wherein each element comprises speech records having timestamps within a same time interval; generating a numeric value for each element in the time series according to: (1) a weight for each speech record, wherein the weight is calculated using a set of candidate feature constraints; (2) an objective function with respect to a trend to be analyzed; and (3) a set of record constraints to be applied to a group of records; and generating an alarm when a value for at least one element in the time series meets a threshold.
 16. The computer-readable storage device of claim 15, wherein the weight is further calculated using a statistical regression.
 17. The computer-readable storage device of claim 15, wherein a user specifies a size of the same time interval.
 18. The computer-readable storage device of claim 15, wherein the elements of the time series are separated by an interval offset.
 19. The computer-readable storage device of claim 15, wherein the numerical value for each element in the time series is further generated by summing respective scores for a plurality of features.
 20. The computer-readable storage device of claim 15, having additional instructions stored which, when executed by the computing device, cause the computing device to perform operations comprising notifying a user when a specific combination of alarms is generated, the specific combination of alarms including the alarm.